
    Asymptotic Theory for Random Forests

    Random forests have proven to be reliable predictive algorithms in many application areas. Not much is known, however, about the statistical properties of random forests. Several authors have established conditions under which their predictions are consistent, but these results do not provide practical estimates of random forest errors. In this paper, we analyze a random forest model based on subsampling, and show that random forest predictions are asymptotically normal provided that the subsample size s scales as s(n)/n = o(log(n)^{-d}), where n is the number of training examples and d is the number of features. Moreover, we show that the asymptotic variance can be consistently estimated using an infinitesimal jackknife for bagged ensembles recently proposed by Efron (2014). In other words, our results let us both characterize and estimate the error distribution of random forest predictions, thus taking a step towards making random forests tools for statistical inference instead of just black-box predictive algorithms.
    Comment: This manuscript is superseded by "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests" by Wager and Athey (arXiv:1510.04342). The new paper extends the asymptotic theory developed here and applies it to causal inference in the potential outcomes framework with unconfoundedness. The present version is maintained online for archival purposes only.
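
The infinitesimal jackknife referred to above can be illustrated in a few lines. The sketch below builds a toy subsampled ensemble in pure Python, using a 1-NN rule on each subsample as a stand-in for an actual regression tree, and computes the Efron (2014) variance estimate as the sum over observations of the squared covariance between each observation's inclusion indicator and the per-tree predictions. All names and settings here are illustrative, not the paper's implementation.

```python
import random
import statistics

def rf_prediction_with_ij(X, y, x0, B=200, s=None, seed=0):
    """Toy subsampled 'forest': each 'tree' is a 1-NN rule fit on a
    subsample of size s, so the ensemble prediction at x0 and the
    infinitesimal-jackknife variance estimate can be computed without
    any ML library."""
    rng = random.Random(seed)
    n = len(X)
    s = s or max(2, int(n ** 0.7))        # subsample size s(n) = o(n)
    preds, counts = [], []
    for _ in range(B):
        idx = rng.sample(range(n), s)     # subsample without replacement
        # 1-NN "tree": predict the y of the nearest subsampled point
        j = min(idx, key=lambda i: abs(X[i] - x0))
        preds.append(y[j])
        row = [0] * n
        for i in idx:
            row[i] = 1
        counts.append(row)
    mu = statistics.fmean(preds)
    # IJ variance: sum_i Cov(N_i, T)^2, with N_i the inclusion
    # indicator of observation i and T the per-tree prediction
    var_ij = 0.0
    for i in range(n):
        cov = statistics.fmean(
            counts[b][i] * (preds[b] - mu) for b in range(B))
        var_ij += cov ** 2
    return mu, var_ij
```

The variance estimate is nonnegative by construction; in practice a bias correction for the Monte Carlo noise in the covariances is also applied.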

    The Efficiency of Density Deconvolution

    The density deconvolution problem involves recovering a target density g from a sample that has been corrupted by noise. From the perspective of Le Cam's local asymptotic normality theory, we show that non-parametric density deconvolution with Gaussian noise behaves similarly to a low-dimensional parametric problem that can easily be solved by maximum likelihood. This framework allows us to give a simple account of the statistical efficiency of density deconvolution and to concisely describe the effect of Gaussian noise on our ability to estimate g, all while relying on classical maximum likelihood theory instead of the kernel estimators typically used to study density deconvolution.
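
As a toy instance of the parametric viewpoint: if the target g is itself Gaussian with unknown mean and variance and the noise level sigma is known, then the observations are Gaussian with variance tau^2 + sigma^2, and the deconvolution MLE has a closed form. The Gaussian-target assumption below is ours, purely for illustration; the paper's point is about the general non-parametric problem.

```python
import statistics

def deconvolve_gaussian_mle(y, sigma):
    """MLE for a Gaussian target g = N(mu, tau^2) given observations
    y_i = x_i + eps_i with known noise eps_i ~ N(0, sigma^2).
    Since Y ~ N(mu, tau^2 + sigma^2), deconvolution reduces to a
    two-parameter maximum-likelihood problem."""
    mu_hat = statistics.fmean(y)
    # MLE of Var(Y) uses the 1/n divisor; subtract the known noise
    tau2_hat = max(statistics.pvariance(y) - sigma ** 2, 0.0)
    return mu_hat, tau2_hat
```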

    Subsampling Extremes: From Block Maxima to Smooth Tail Estimation

    We study a new estimator for the tail index of a distribution in the Fréchet domain of attraction that arises naturally by computing subsample maxima. This estimator is equivalent to taking a U-statistic over a Hill estimator with two order statistics. The estimator presents multiple advantages over the Hill estimator. In particular, it has asymptotically smooth sample paths as a function of the threshold k, making it considerably more stable than the Hill estimator. The estimator also admits a simple and intuitive threshold selection rule that does not require fitting a second-order model.
    Journal of Multivariate Analysis, 130, 2014. Comment: Added reference.
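
For context, the classical Hill estimator and a crude moving-average smoothing of it can be sketched as follows. The second function is only a naive stand-in for the subsample-maxima U-statistic studied in the paper, which achieves smoothness in k in a principled way rather than by post-hoc averaging.

```python
import math

def hill(x, k):
    """Classical Hill estimator of the tail index gamma = 1/alpha,
    based on the k largest order statistics."""
    xs = sorted(x, reverse=True)
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k

def smoothed_hill(x, k, window=10):
    """Crude smoothing: average the Hill estimator over nearby
    thresholds. Illustrative only -- not the paper's estimator."""
    ks = range(max(1, k - window), min(len(x) - 2, k + window) + 1)
    return sum(hill(x, j) for j in ks) / len(ks)
```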

    Quasi-Oracle Estimation of Heterogeneous Treatment Effects

    Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical challenges, such as personalized medicine and optimal resource allocation. In this paper, we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities in order to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. Our approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: In both steps, we can use any loss-minimization method, e.g., penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross-validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property: Even if the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle who has a priori knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation setups, and find promising performance relative to existing baselines.
    Comment: Biometrika, forthcoming.
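
The two-step recipe can be sketched with the simplest possible components: linear pilot fits for the marginal effect m(x) = E[Y | X = x] and the propensity e(x) = E[W | X = x], followed by a residual-on-residual regression for a constant treatment effect tau. Every choice below (linear pilots, constant tau, no cross-fitting or penalization) is a simplification for illustration; real uses would plug in flexible, cross-fitted learners.

```python
def lstsq1(x, y):
    """Slope and intercept of a simple least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return b, my - b * mx

def r_learner_constant_tau(x, w, y):
    """Two-step sketch: fit pilots for m(x) = E[Y|X] and e(x) = E[W|X],
    then regress the outcome residuals on the treatment residuals,
    (Y - m(X)) ~ tau * (W - e(X)), to isolate the causal component."""
    bm, am = lstsq1(x, y)              # pilot for the marginal effect
    be, ae = lstsq1(x, w)              # pilot for the propensity
    ry = [yi - (am + bm * xi) for xi, yi in zip(x, y)]
    rw = [wi - (ae + be * xi) for xi, wi in zip(x, w)]
    return sum(a * b for a, b in zip(rw, ry)) / sum(a * a for a in rw)
```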

    Experimenting in Equilibrium

    Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, as when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured via mean-field modeling. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobtrusive randomization with lightweight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize p in large systems using vanishingly small perturbations.
    Comment: Forthcoming in Management Science.
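
The basic ingredient, randomized small perturbations to estimate a gradient, can be shown in a toy model with no actual cross-unit interference: each supplier sees a payment p + delta or p - delta at random and joins with probability sigmoid(p_i - c_i) for a private cost c_i, and the platform earns (v - p_i) per joining supplier. Comparing the two randomized groups estimates dU/dp without moving the system far from its operating point. The model, parameters, and names below are our illustrative choices, not the paper's mean-field setup.

```python
import random
import math

def estimate_payment_gradient(p, n=20000, delta=0.1, seed=0):
    """Estimate the derivative of per-supplier platform profit with
    respect to the payment p, from a +/- delta randomized perturbation
    across n suppliers (a toy model, no equilibrium interference)."""
    rng = random.Random(seed)
    v = 2.0                                  # value of a joining supplier
    tot = {1: 0.0, -1: 0.0}
    cnt = {1: 0, -1: 0}
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else -1  # randomize the perturbation
        pi = p + s * delta
        c = rng.gauss(1.0, 0.5)              # supplier's private cost
        joins = rng.random() < 1 / (1 + math.exp(-(pi - c)))
        tot[s] += (v - pi) if joins else 0.0
        cnt[s] += 1
    # two-sided finite difference between the randomized groups
    return (tot[1] / cnt[1] - tot[-1] / cnt[-1]) / (2 * delta)
```

In the toy model above, raising p at p = 1 costs more per joining supplier than it gains in participation, so the estimated gradient is negative.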

    Adaptive Concentration of Regression Trees, with Application to Random Forests

    We study the convergence of the predictive surface of regression trees and forests. To support our analysis we introduce a notion of adaptive concentration for regression trees. This approach breaks tree training into a model selection phase in which we pick the tree splits, followed by a model fitting phase where we find the best regression model consistent with these splits. We then show that the fitted regression tree concentrates around the optimal predictor with the same splits: as d and n get large, the discrepancy is with high probability bounded on the order of sqrt(log(d) log(n)/k) uniformly over the whole regression surface, where d is the dimension of the feature space, n is the number of training examples, and k is the minimum leaf size for each tree. We also provide rate-matching lower bounds for this adaptive concentration statement. From a practical perspective, our result enables us to prove consistency results for adaptively grown forests in high dimensions, and to carry out valid post-selection inference in the sense of Berk et al. [2013] for subgroups defined by tree leaves.

    Confidence Intervals for Nonparametric Empirical Bayes Analysis

    In an empirical Bayes analysis, we use data from repeated sampling to imitate inferences made by an oracle Bayesian with extensive knowledge of the data-generating distribution. Existing results provide a comprehensive characterization of when and why empirical Bayes point estimates accurately recover oracle Bayes behavior. In this paper, we develop flexible and practical confidence intervals that provide asymptotic frequentist coverage of empirical Bayes estimands, such as the posterior mean or the local false sign rate. The coverage statements hold even when the estimands are only partially identified or when empirical Bayes point estimates converge very slowly.
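
One concrete estimand of this kind is the posterior mean under Tweedie's formula: for z = mu + N(0, 1) noise, E[mu | z] = z + f'(z)/f(z), where f is the marginal density of z. A point-estimate sketch follows, with f estimated by a Gaussian kernel density; the kernel estimator and bandwidth are illustrative choices for showing the estimand, not the paper's interval construction.

```python
import math

def tweedie_posterior_mean(z, zs, h=0.3):
    """Empirical-Bayes posterior mean via Tweedie's formula,
    E[mu | z] = z + f'(z)/f(z), with the marginal density f (and its
    derivative) estimated by a Gaussian kernel density on the sample
    zs. Bandwidth h is an arbitrary illustrative choice."""
    f = fp = 0.0
    for zj in zs:
        u = (z - zj) / h
        k = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        f += k / h                 # kernel density contribution
        fp += -u * k / (h * h)     # its derivative in z
    return z + fp / f              # the 1/n factors cancel in the ratio
```

With mu ~ N(0, 1), the oracle posterior mean shrinks z halfway toward zero, and the plug-in estimate mimics that shrinkage.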

    High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification

    We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where p, n → ∞ and p/n → γ ∈ (0, ∞), and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength, and the aspect ratio γ. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover several qualitative insights about both methods: for example, with ridge regression, there is an exact inverse relation between the limiting predictive risk and the limiting estimation risk given a fixed signal strength. Our analysis builds on recent advances in random matrix theory.
    Comment: Added a section on prediction versus estimation for ridge regression. Rewrote introduction. Other results unchanged.

    Semiparametric Exponential Families for Heavy-Tailed Data

    We propose a semiparametric method for fitting the tail of a heavy-tailed population given a relatively small sample from that population and a larger sample from a related background population. We model the tail of the small sample as an exponential tilt of the better-observed large-sample tail, using a robust sufficient statistic motivated by extreme value theory. In particular, our method induces an estimator of the small-population mean, and we give theoretical and empirical evidence that this estimator outperforms methods that do not use the background sample. We demonstrate substantial efficiency gains over competing methods in simulation and on data from a large controlled experiment conducted by Facebook.
    Comment: To appear in Biometrika.
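
The exponential-tilt idea can be sketched with a single illustrative sufficient statistic T(x) = log x: the small-sample density is modeled as proportional to f0(x)·exp(theta·T(x)), where f0 is the background density, and the MLE for theta solves the moment condition that the tilt-weighted background mean of T matches the small-sample mean of T. The choice of T below is ours for illustration; the paper constructs a robust tail statistic motivated by extreme value theory.

```python
import math

def fit_exponential_tilt(background, small, T=math.log):
    """Fit a one-parameter exponential tilt of the background sample
    toward the small sample by solving the MLE moment condition
    E_tilt[T] = (small-sample mean of T) with bisection on theta.
    Returns theta and the tilt-weighted estimate of the
    small-population mean."""
    target = sum(T(v) for v in small) / len(small)
    tb = [T(v) for v in background]

    def tilted_mean(theta):
        m = max(theta * t for t in tb)       # stabilize the exponents
        w = [math.exp(theta * t - m) for t in tb]
        return sum(wi * ti for wi, ti in zip(w, tb)) / sum(w)

    lo, hi = -5.0, 5.0                       # assumes the root is inside
    for _ in range(60):                      # tilted_mean is increasing
        mid = (lo + hi) / 2.0
        if tilted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    m = max(theta * t for t in tb)
    w = [math.exp(theta * t - m) for t in tb]
    mean_est = sum(wi * v for wi, v in zip(w, background)) / sum(w)
    return theta, mean_est
```

With a lognormal background, tilting by theta·log(x) shifts the log-mean by exactly theta, so the fit is easy to sanity-check on simulated data.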

    Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

    Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.
    Comment: To appear in the Journal of the American Statistical Association. Part of the results developed in this paper were made available as an earlier technical report, "Asymptotic Theory for Random Forests" (arXiv:1405.0352).